Video summarization attracts attention as a means of efficient video representation, retrieval, and browsing that eases problems caused by surging video volume and traffic. Although video summarization mostly relies on the visual channel for compaction, the benefits of audio-visual modeling have appeared in recent literature. The information coming from the audio channel can be a result of audio-visual correlation in the video content. In this study, we propose a new audio-visual video summarization framework that integrates four ways of audio-visual information fusion with GRU-based and attention-based networks. Furthermore, we investigate a new explainability methodology using audio-visual canonical correlation analysis (CCA) to better understand and explain the role of audio in the video summarization task. Experimental evaluations on the TVSum dataset show F1 score and Kendall-tau score improvements for audio-visual video summarization. Furthermore, splitting the video content of the TVSum and COGNIMUSE datasets into positively and negatively correlated videos based on audio-visual CCA yields a strong performance improvement on the positively correlated videos for audio-only and audio-visual video summarization.
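Unmanned aerial vehicles (UAVs) are expected to play an important role in future communication systems. When a 3D model of the region of interest is available, optimal positioning of UAVs can be done through extensive field measurements or ray tracing simulations. In this paper, we propose an alternative approach to optimizing the altitude of a UAV base station for a region. The approach is based on deep learning; specifically, a 2D satellite image of the target region is input to a deep neural network to predict path loss distributions for different UAV altitudes. The predicted path loss distributions are used to calculate the coverage in the region, and the optimal altitude is the one that maximizes coverage. The neural network is designed and trained to produce multiple path loss distributions in a single inference; thus, it is not necessary to train a separate network for each altitude.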
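The final step described above (coverage from predicted path loss, then choosing the altitude) can be sketched as follows. This is only an illustration under stated assumptions: the path loss values stand in for the network's output, coverage is taken to be the fraction of locations at or below a path loss threshold, and the 120 dB threshold, the candidate altitudes, and the function names are hypothetical rather than taken from the paper.

```python
def coverage(path_loss_db, max_loss_db):
    """Fraction of locations whose path loss is at or below the threshold."""
    covered = sum(1 for loss in path_loss_db if loss <= max_loss_db)
    return covered / len(path_loss_db)


def best_altitude(predicted, max_loss_db):
    """Return (altitude, coverage) for the altitude with the highest coverage.

    `predicted` maps an altitude in metres to path loss samples in dB.
    """
    scores = {alt: coverage(losses, max_loss_db) for alt, losses in predicted.items()}
    alt = max(scores, key=scores.get)
    return alt, scores[alt]


# Toy stand-in for the predicted path loss samples (assumed values).
predicted = {
    50: [100, 125, 130, 135],   # 1 of 4 locations within the threshold
    100: [105, 110, 118, 126],  # 3 of 4 locations within the threshold
    150: [115, 119, 124, 128],  # 2 of 4 locations within the threshold
}
alt, cov = best_altitude(predicted, max_loss_db=120)
```

Because all altitudes are scored from one set of predictions, the selection itself is a single pass over the candidates.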
The field of autonomous mobile robots has undergone dramatic advancements over the past decades. Despite important milestones, several challenges are yet to be addressed. Aggregating the achievements of the robotics community in survey papers is vital for keeping track of the current state of the art and of the challenges that must be tackled in the future. This paper aims to provide a comprehensive review of autonomous mobile robots, covering topics such as sensor types, mobile robot platforms, simulation tools, path planning and following, sensor fusion methods, obstacle avoidance, and SLAM. The motivation for presenting a survey paper is twofold. First, the field of autonomous navigation evolves quickly, so writing survey papers regularly is crucial for keeping the research community aware of its current status. Second, deep learning methods have revolutionized many fields, including autonomous navigation. It is therefore necessary to give an appropriate treatment of the role of deep learning in autonomous navigation, which this paper also covers. Future work and research gaps are discussed as well.
Multi-class ensemble classification remains a popular focus of investigation within the research community. The popularization of cloud services has sped up its adoption because large-scale machine-learning models have become easy to deploy. It has also drawn the attention of the industrial sector because of its ability to identify common problems in production. However, building an ensemble classifier poses challenges, namely the proper selection and effective training of the pool of classifiers, the definition of a proper architecture for multi-class classification, and the uncertainty quantification of the ensemble classifier. The robustness and effectiveness of the ensemble classifier lie in the selection of the pool of classifiers as well as in the learning process. Hence, the selection and the training procedure of the pool of classifiers play a crucial role. An (ensemble) classifier learns to detect the classes that were used during supervised training. However, when fed data with unknown conditions, the trained classifier will still attempt to predict one of the classes learned during training. To address this, the uncertainty of the individual classifiers and of the ensemble classifier can be used to assess the learning capability. We present a novel approach for novelty detection using ensemble classification and evidence theory. A pool selection strategy is presented to build a solid ensemble classifier. We present an architecture for multi-class ensemble classification and an approach to quantify the uncertainty of the individual classifiers and the ensemble classifier. We use this uncertainty for anomaly detection. Finally, we use the Tennessee Eastman benchmark to perform experiments that test the ensemble classifier's prediction and anomaly detection capabilities.
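To make the role of evidence theory more concrete, here is a minimal sketch of Dempster's rule of combination, one standard way to fuse two classifiers' outputs in Dempster-Shafer theory. The abstract does not say which combination rule the authors use, so this is an assumption; the class names `normal` and `fault` and all mass values are hypothetical.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Fuse two mass functions (dict: frozenset of classes -> mass).

    Returns the normalised fused masses and the conflict mass K.
    """
    fused = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            fused[common] = fused.get(common, 0.0) + wa * wb
        else:
            conflict += wa * wb  # the two sources contradict each other
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    return {k: v / (1.0 - conflict) for k, v in fused.items()}, conflict


NORMAL, FAULT = frozenset({"normal"}), frozenset({"fault"})
EITHER = NORMAL | FAULT  # mass on the whole frame expresses ignorance

m_clf1 = {NORMAL: 0.6, EITHER: 0.4}
m_clf2 = {FAULT: 0.5, EITHER: 0.5}
fused, conflict = dempster_combine(m_clf1, m_clf2)
```

In this kind of scheme, the mass left on the whole frame and the conflict between sources are natural candidates for an uncertainty signal, which is one plausible reading of how uncertainty could flag unknown conditions.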
We address the problem of unsupervised domain adaptation when the source domain differs from the target domain because of a shift in the distribution of a latent subgroup. When this subgroup confounds all observed data, neither covariate shift nor label shift assumptions apply. We show that the optimal target predictor can be non-parametrically identified with the help of concept and proxy variables available only in the source domain, and unlabeled data from the target. The identification results are constructive, immediately suggesting an algorithm for estimating the optimal predictor in the target. For continuous observations, when this algorithm becomes impractical, we propose a latent variable model specific to the data generation process at hand. We show how the approach degrades as the size of the shift changes, and verify that it outperforms both covariate and label shift adjustment.
In intensively managed forests in Europe, where forests are divided into stands of small size and may show heterogeneity within stands, a high spatial resolution (10-20 meters) is arguably needed to capture the differences in canopy height. In this work, we developed a deep learning model based on multi-stream remote sensing measurements to create a high-resolution canopy height map over the "Landes de Gascogne" forest in France, a large maritime pine plantation of 13,000 km$^2$ with flat terrain and intensive management. This area is characterized by even-aged and mono-specific stands, with a typical length of a few hundred meters, harvested every 35 to 50 years. Our deep learning U-Net model uses multi-band images from Sentinel-1 and Sentinel-2 with composite time averages as input to predict tree height derived from GEDI waveforms. The evaluation is performed with external validation data from forest inventory plots and a stereo 3D reconstruction model based on SkySat imagery available at specific locations. We trained seven different U-Net models based on combinations of Sentinel-1 and Sentinel-2 bands to evaluate the importance of each instrument in the dominant height retrieval. The model outputs allow us to generate a 10 m resolution canopy height map of the whole "Landes de Gascogne" forest area for 2020 with a mean absolute error of 2.02 m on the test dataset. The best predictions were obtained using all available satellite layers from Sentinel-1 and Sentinel-2, but using only one satellite source also provided good predictions. For all validation datasets in coniferous forests, our model showed better metrics than previous canopy height models available in the same region.
Cancer is one of the most challenging diseases because of its complexity, variability, and diversity of causes. It has been a major research topic over the past decades, yet it is still poorly understood. Multifaceted therapeutic frameworks are therefore indispensable. \emph{Anticancer peptides} (ACPs) are the most promising treatment option, but their large-scale identification and synthesis require reliable prediction methods, which remain an open problem. In this paper, we present an intuitive classification strategy that differs from the traditional \emph{black box} method and is based on the well-known statistical theory of \emph{sparse-representation classification} (SRC). Specifically, we create over-complete dictionary matrices by embedding the \emph{composition of the K-spaced amino acid pairs} (CKSAAP). Unlike traditional SRC frameworks, this strategy uses an efficient \emph{matching pursuit} solver instead of the computationally expensive \emph{basis pursuit} solver. Furthermore, \emph{kernel principal component analysis} (KPCA) is employed to cope with non-linearity and to reduce the dimension of the feature space, whereas the \emph{synthetic minority oversampling technique} (SMOTE) is used to balance the dictionary. The proposed method is evaluated on two benchmark datasets using well-known statistical measures and is found to outperform the existing methods. The results show the highest sensitivity with the most balanced accuracy, which might be beneficial in understanding structural and chemical aspects and developing new ACPs. The Google Colab implementation of the proposed method is available at the author's GitHub page (\href{https://github.com/ehtisham-Fazal/ACP-Kernel-SRC}{https://github.com/ehtisham-fazal/ACP-Kernel-SRC}).
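The core SRC idea with a matching pursuit solver can be sketched in a few lines. This is a toy illustration, not the authors' implementation: it omits CKSAAP encoding, KPCA, and SMOTE, the three-dimensional feature vectors are invented, and the helper names are hypothetical. A sample is coded greedily over unit-norm dictionary atoms, then assigned to the class whose atoms reconstruct it with the smallest residual.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def matching_pursuit(atoms, x, n_iter):
    """Greedy sparse coding of x over unit-norm atoms; returns one coefficient per atom."""
    coeffs = [0.0] * len(atoms)
    residual = list(x)
    for _ in range(n_iter):
        scores = [dot(a, residual) for a in atoms]
        k = max(range(len(atoms)), key=lambda i: abs(scores[i]))
        coeffs[k] += scores[k]
        residual = [r - scores[k] * a for r, a in zip(residual, atoms[k])]
    return coeffs


def src_classify(atoms, labels, x, n_iter=3):
    """Pick the class whose atoms alone give the smallest reconstruction error."""
    coeffs = matching_pursuit(atoms, x, n_iter)
    errors = {}
    for cls in set(labels):
        recon = [
            sum(c * a[d] for c, a, lab in zip(coeffs, atoms, labels) if lab == cls)
            for d in range(len(x))
        ]
        errors[cls] = math.sqrt(sum((xi - ri) ** 2 for xi, ri in zip(x, recon)))
    return min(errors, key=errors.get)


# Invented toy dictionary: two atoms per class.
atoms = [normalize(v) for v in ([1.0, 0.1, 0.0], [0.9, 0.2, 0.0],
                                [0.0, 0.1, 1.0], [0.0, 0.2, 0.9])]
labels = ["ACP", "ACP", "non-ACP", "non-ACP"]
x = [1.0, 0.15, 0.05]
predicted = src_classify(atoms, labels, x)
```

Each matching pursuit step costs one pass of inner products over the dictionary, which is the sense in which it is cheaper than solving the convex basis pursuit problem.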
Fisher's criterion is a widely used tool in machine learning for feature selection. For large search spaces, Fisher's criterion can provide a scalable solution for selecting features. A challenging limitation of Fisher's criterion, however, is that it performs poorly when the mean values of the class-conditional distributions are close to each other. Motivated by this challenge, we propose an extension of Fisher's criterion to overcome this limitation. The proposed extension utilizes the available heteroscedasticity of the class-conditional distributions to distinguish one class from another. Additionally, we describe how our theoretical results can be cast into a neural network framework, and we conduct a proof-of-concept experiment to demonstrate the viability of our approach for solving classification problems.
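The limitation mentioned above is easy to see with the common two-class, single-feature form of Fisher's criterion, $(\mu_0 - \mu_1)^2 / (\sigma_0^2 + \sigma_1^2)$. The sketch below uses that textbook form and invented toy data; it does not implement the proposed extension, whose details the abstract does not give.

```python
from statistics import fmean, pvariance


def fisher_score(x0, x1):
    """Two-class Fisher criterion for one feature: squared mean gap over summed variances."""
    return (fmean(x0) - fmean(x1)) ** 2 / (pvariance(x0) + pvariance(x1))


# Feature A: the classes differ in mean and have the same spread.
a0, a1 = [0.0, 2.0], [4.0, 6.0]
# Feature B: identical means, very different spread (heteroscedastic).
b0, b1 = [-1.0, 1.0], [-5.0, 5.0]

score_a = fisher_score(a0, a1)
score_b = fisher_score(b0, b1)
```

Feature B receives a score of zero even though its classes are clearly distinguishable by their variance, which is exactly the kind of information the proposed extension is said to exploit.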
Vision Transformers convert images to sequences by slicing them into patches. The size of these patches controls a speed/accuracy tradeoff, with smaller patches leading to higher accuracy at greater computational cost, but changing the patch size typically requires retraining the model. In this paper, we demonstrate that simply randomizing the patch size at training time leads to a single set of weights that performs well across a wide range of patch sizes, making it possible to tailor the model to different compute budgets at deployment time. We extensively evaluate the resulting model, which we call FlexiViT, on a wide range of tasks, including classification, image-text retrieval, open-world detection, panoptic segmentation, and semantic segmentation, concluding that it usually matches, and sometimes outperforms, standard ViT models trained at a single patch size in an otherwise identical setup. Hence, FlexiViT training is a simple drop-in improvement for ViT that makes it easy to add compute-adaptive capabilities to most models relying on a ViT backbone architecture. Code and pre-trained models are available at https://github.com/google-research/big_vision
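The speed/accuracy tradeoff follows from simple arithmetic: the patch size fixes how many tokens the Transformer processes per image. The sketch below shows that relationship and a uniform random choice of patch size per training step. The 240-pixel image size and the list of patch sizes are assumptions made for illustration, and the sketch does not cover how the model's weights are adapted to each patch size.

```python
import random


def num_tokens(image_size, patch_size):
    """Number of non-overlapping square patches in a square image."""
    if image_size % patch_size != 0:
        raise ValueError("patch size must divide the image size")
    return (image_size // patch_size) ** 2


IMAGE_SIZE = 240                       # assumed resolution, for illustration only
PATCH_SIZES = [8, 12, 16, 24, 30, 48]  # assumed candidates, all divide 240

# Sequence length for each patch size: smaller patches mean many more tokens.
tokens = {p: num_tokens(IMAGE_SIZE, p) for p in PATCH_SIZES}


def sample_patch_size(rng):
    """Pick one patch size uniformly at random, e.g. once per training step."""
    return rng.choice(PATCH_SIZES)


rng = random.Random(0)
schedule = [sample_patch_size(rng) for _ in range(5)]
```

Going from 48-pixel to 8-pixel patches multiplies the token count by 36 here, and since self-attention cost grows roughly quadratically with sequence length, the compute gap is larger still.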
Adequate strategizing of agents' behaviors is essential to solving cooperative MARL problems. One intuitively beneficial yet uncommon method in this domain is predicting agents' future behaviors and planning accordingly. Building on this idea, we propose a two-level hierarchical architecture that combines a novel information-theoretic objective with a trajectory prediction model to learn a strategy. To this end, we introduce a latent policy that learns two types of latent strategies, individual $z_A$ and relational $z_R$, using a modified Graph Attention Network module to extract interaction features. We encourage each agent to behave according to the strategy by conditioning its local $Q$ functions on $z_A$, and we further equip agents with a shared $Q$ function that conditions on $z_R$. Additionally, we introduce two regularizers that encourage predicted trajectories to be accurate and rewarding. Empirical results on Google Research Football (GRF) and StarCraft (SC) II micromanagement tasks show that our method establishes a new state of the art, being, to the best of our knowledge, the first MARL algorithm to solve all super hard SC II scenarios as well as the GRF full game with a win rate higher than $95\%$, thus outperforming all existing methods. Videos and a brief overview of the methods and results are available at: https://sites.google.com/view/hier-strats-marl/home.